Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish

Authors

  • Martine Adda-Decker
  • Lori Lamel
  • Natalie D. Snoeren
Abstract

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and has often been viewed as one of Europe’s under-resourced languages. We focus on the acoustic modeling of Luxembourgish. By taking advantage of monolingual acoustic seeds selected from German, French or English model sets via IPA symbol correspondences, we investigated whether Luxembourgish spoken words were globally better represented by one of these languages. Although speech in Luxembourgish is frequently interspersed with French words, forced alignments on these data showed a clear preference for Germanic acoustic models with only a limited usage of French. German models provided the best match for 54% of the data, compared with 35% for English models and only 11% for French models. A set of multilingual acoustic models, estimated on the pooled German, French, and English audio data, captured 27% to 48% of the data depending on conditions.
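
The per-language percentages reported in the abstract are shares of aligned data attributed to each source language's models. As a minimal sketch of how such shares could be tallied, assume the forced alignment yields one source-language label per aligned segment. The function name `language_shares`, the label codes, and the toy labels are hypothetical illustrations, not the authors' tooling or data.

```python
from collections import Counter


def language_shares(segment_labels):
    """Return the percentage of aligned segments assigned to each language.

    segment_labels: iterable of language codes, one per aligned segment
    (assumed output format; the paper does not specify one).
    """
    counts = Counter(segment_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Hypothetical toy labels for illustration only, NOT data from the paper.
toy_labels = ["de", "de", "en", "fr", "de", "en", "de", "en", "de", "fr"]
shares = language_shares(toy_labels)

# Print languages from most to least frequently selected.
for lang, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{lang}: {pct:.1f}%")
```

Counting per segment is only one possible choice; shares could equally be weighted by segment duration, and the abstract does not say which unit the reported percentages use.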

Similar resources

Initializing acoustic phone models of under-resourced languages: a case-study of Luxembourgish

The national language of the Grand-Duchy of Luxembourg, Luxembourgish, has often been characterized as one of Europe’s under-described and under-resourced languages. In this contribution we report on our ongoing work to take Luxembourgish on board as an e-language: an electronically searchable spoken language. More specifically, we focus on the issue of producing acoustic seed models for Luxem...

Studying Luxembourgish Phonetics via Multilingual Forced Alignments

Luxembourgish, a Germanic-Franconian language, is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s under-described languages. This paper investigates the similarity of Luxembourgish phone segments to those of German, French and English via forced speech alignment techniques. Making use of monolingual acoustic seed models from these...

A first LVCSR system for Luxembourgish, an under-resourced European language

Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s under-described languages. We describe our efforts in building a large vocabulary ASR system for such a “minority” language (target language: Luxembourgish) without any transcribed audio training data. Instead, acoustic models are derived from major languages (sou...

Speech alignment and recognition experiments for Luxembourgish

Luxembourgish, embedded in a multilingual context on the divide between Romance and Germanic cultures, remains one of Europe’s under-described languages. In this paper, we propose to study acoustic similarities between Luxembourgish and major contact languages (German, French, English) with the help of automatic speech alignment and recognition systems. Experiments were run using monolingual ac...

Automatic language identity tagging on word and sentence-level in multilingual text sources: a case-study on Luxembourgish

Luxembourgish, embedded in a multilingual context on the divide between Romance and Germanic cultures, remains one of Europe’s under-described languages. This is due to the fact that the written production remains relatively low, and linguistic knowledge and resources, such as lexica and pronunciation dictionaries, are sparse. The speakers or writers will frequently switch between Luxembourgish...

Publication date: 2010